
    Link prediction in drug-target interactions network using similarity indices.

    BACKGROUND: In silico drug-target interaction (DTI) prediction plays an integral role in drug repositioning: the discovery of new uses for existing drugs. One popular method of drug repositioning is network-based DTI prediction, which uses complex network theory to predict DTIs from a drug-target network. Currently, most network-based DTI prediction is based on machine learning methods such as Restricted Boltzmann Machines (RBM) or Support Vector Machines (SVM). These methods require additional information about the characteristics of drugs, targets and DTIs, such as chemical structure, genome sequence, binding types, or causes of interactions, and do not perform satisfactorily when such information is unavailable. To address this problem, we propose an alternative method for DTI prediction that uses only network topology information. RESULTS: We compare our method for DTI prediction against the well-known RBM approach. We show that when applied to the MATADOR database, our approach based on node neighborhoods yields higher precision for high-ranking predictions than RBM when no information regarding DTI types is available. CONCLUSION: This demonstrates that approaches based purely on network topology are more suitable for DTI prediction in the many real-life situations where little or no prior knowledge is available about the characteristics of drugs, targets, or their interactions.
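    The node-neighbourhood idea behind such topology-only prediction can be sketched with the classic common-neighbours similarity index. The abstract does not name the exact indices used, so the graph and scoring below are purely illustrative:

```python
# Minimal sketch of a neighbourhood-based similarity index for link
# prediction (common neighbours). The paper's exact indices are not
# specified in the abstract; this is an illustrative toy example.

def common_neighbours(adj, u, v):
    """Score a candidate link (u, v) by the number of shared neighbours."""
    return len(adj.get(u, set()) & adj.get(v, set()))

def rank_candidate_links(adj):
    """Rank all non-adjacent node pairs by their similarity score."""
    nodes = sorted(adj)
    scores = {}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v not in adj[u]:                # only unobserved links
                scores[(u, v)] = common_neighbours(adj, u, v)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy undirected network (hypothetical node identifiers).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "e"},
    "d": {"a", "e"},
    "e": {"c", "d"},
}
```

The highest-scoring unobserved pairs are predicted as the most likely new links; no node attributes (chemical structure, sequence, etc.) are needed, which is the point of the topology-only approach.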

    Bimodal network architectures for automatic generation of image annotation from text

    Medical image analysis practitioners have embraced big data methodologies. This has created a need for large annotated datasets. The source of big data is typically large image collections and the clinical reports recorded for these images. In many cases, however, building algorithms aimed at segmentation and detection of disease requires a training dataset with markings of the areas of interest on the image that match the described anomalies. This annotation process is expensive and needs the involvement of clinicians. In this work we propose two separate deep neural network architectures for automatic marking of a region of interest (ROI) on the image that best represents a finding location, given a textual report or a set of keywords. One architecture consists of LSTM and CNN components and is trained end to end with images, matching text, and markings of ROIs for those images. The output layer estimates the coordinates of the vertices of a polygonal region. The second architecture uses a network pre-trained on a large dataset of the same image types to learn feature representations of the findings of interest. We show that for a variety of findings from chest X-ray images, both proposed architectures learn to estimate the ROI, as validated by clinical annotations. There is a clear advantage obtained from the architecture with the pre-trained imaging network: the centroids of the ROIs marked by this network were on average at a distance equivalent to 5.1% of the image width from the centroids of the ground-truth ROIs. Comment: Accepted to MICCAI 2018, LNCS 1107
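    The reported evaluation metric, centroid distance as a percentage of image width, can be sketched as follows (the function names and the vertex-mean centroid are illustrative assumptions, not the paper's implementation):

```python
import math

def polygon_centroid(vertices):
    """Arithmetic mean of the vertices (a simple centroid proxy;
    an area-weighted centroid could be substituted for irregular polygons)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def centroid_distance_pct(pred_vertices, true_vertices, image_width):
    """Euclidean distance between predicted and ground-truth ROI
    centroids, expressed as a percentage of the image width."""
    px, py = polygon_centroid(pred_vertices)
    tx, ty = polygon_centroid(true_vertices)
    return 100.0 * math.hypot(px - tx, py - ty) / image_width
```

Normalising by image width makes the metric comparable across images of different resolutions, which is presumably why the paper reports 5.1% rather than a pixel distance.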

    Unsupervised Declarative Knowledge Induction for Constraint-Based Learning of Information Structure in Scientific Documents

    Inferring the information structure of scientific documents is useful for many NLP applications. Existing approaches to this task require substantial human effort. We propose a framework for constraint learning that reduces human involvement considerably. Our model uses topic models to identify latent topics and their key linguistic features in input documents, induces constraints from this information, and maps sentences to their dominant information structure categories through a constrained unsupervised model. When the induced constraints are combined with a fully unsupervised model, the resulting model challenges existing lightly supervised feature-based models as well as unsupervised models that use manually constructed declarative knowledge. Our results demonstrate that useful declarative knowledge can be learned from data with very limited human involvement. This is the final published version; it first appeared at https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/472

    Neural networks for open and closed Literature-based Discovery

    Funder: Cambridge Commonwealth, European and International Trust; funder-id: http://dx.doi.org/10.13039/501100003343. Funder: St. Edmund’s College, University of Cambridge; funder-id: http://dx.doi.org/10.13039/501100005705. Literature-based Discovery (LBD) aims to discover new knowledge automatically from large collections of literature. Scientific literature is growing at an exponential rate, making it difficult for researchers to stay current in their discipline and easy to miss knowledge necessary to advance their research. LBD can facilitate hypothesis testing and generation and thus accelerate scientific progress. Neural networks have demonstrated improved performance on LBD-related tasks but have yet to be applied to LBD itself. We propose four graph-based neural network methods to perform open and closed LBD. We compared our methods with those used by the state-of-the-art LION LBD system on the same evaluations, replicating recently published findings in cancer biology, and also applied them to a time-sliced dataset of human-curated, peer-reviewed biological interactions. These evaluations and the metrics they employ represent performance on real-world knowledge advances and are thus robust indicators of approach efficacy. In the first experiments, our best methods performed 2-4 times better than the baselines in closed discovery and 2-3 times better in open discovery. In the second, our best methods performed almost 2 times better than the baselines in open discovery. These results are strong indications that neural LBD is potentially a very effective approach for generating new scientific discoveries from existing literature. The code for our models and other information can be found at: https://github.com/cambridgeltl/nn_for_LBD
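    Open and closed discovery follow Swanson's ABC model: closed discovery looks for B-terms linking two given concepts A and C, while open discovery proposes new C-terms reachable from A through some intermediate B. A minimal set-based sketch over a co-occurrence graph (the paper's neural methods replace these set operations with learned scores; graph and identifiers are illustrative):

```python
def closed_discovery(adj, a, c):
    """Closed LBD: given concepts A and C, propose B-terms that
    co-occur with both and could explain an A-C connection."""
    return sorted(adj.get(a, set()) & adj.get(c, set()))

def open_discovery(adj, a):
    """Open LBD: from concept A, propose currently unlinked concepts
    reachable through some shared B-term."""
    direct = adj.get(a, set())
    candidates = set()
    for b in direct:
        candidates |= adj.get(b, set())
    return sorted(candidates - direct - {a})

# Toy co-occurrence graph (hypothetical concept labels).
adj = {
    "a": {"b1", "b2"},
    "b1": {"a", "c"},
    "b2": {"a", "d"},
    "c": {"b1"},
    "d": {"b2"},
}
```

Neural LBD methods keep this A-B-C structure but rank the candidate B- and C-terms with learned graph representations instead of raw set intersections.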

    $\mathcal{N} = 2$ Schur index and line operators

    4d $\mathcal{N} = 2$ SCFTs and their invariants can often be enriched by non-local BPS operators. In this paper we study the flavored Schur index of several types of $\mathcal{N} = 2$ SCFTs with and without line operators, using a series of new integration formulae for elliptic functions and Eisenstein series. We demonstrate how to evaluate analytically the Schur index for a series of $A_2$ class-$\mathcal{S}$ theories and the $\mathcal{N} = 4$ SO(7) theory. For all $A_1$ class-$\mathcal{S}$ theories we obtain closed-form expressions for the SU(2) Wilson line index, and for the 't Hooft line index in some simple cases. We also observe a relation between the line operator index and the characters of the associated chiral algebras. Wilson line indices for some other low-rank gauge theories are also studied. Comment: 72 pages, 9 figures, 5 tables

    Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches.

    Background: Link prediction in biomedical graphs has several important applications, including Drug-Target Interaction (DTI) prediction, Protein-Protein Interaction (PPI) prediction and Literature-Based Discovery (LBD). It can be done using a classifier that outputs the probability of link formation between nodes. Recently, several works have used neural networks to create node representations which allow rich inputs to neural classifiers. Preliminary work along these lines has reported promising results, but did not use realistic settings such as time-slicing, evaluate performance with comprehensive metrics, or explain when or why neural network methods outperform. We investigated how inputs from four node representation algorithms affect the performance of a neural link predictor on random- and time-sliced biomedical graphs of real-world sizes (∼6 million edges) containing information relevant to DTI, PPI and LBD. We compared the performance of the neural link predictor to those of established baselines and report performance across five metrics. Results: In random- and time-sliced experiments, when the neural network methods were able to learn good node representations and there was a negligible number of disconnected nodes, those approaches outperformed the baselines. In the smallest graph (∼15,000 edges) and in larger graphs with approximately 14% disconnected nodes, baselines such as Common Neighbours proved a justifiable choice for link prediction. At low recall levels (∼0.3) the approaches were mostly equal, but at higher recall levels across all nodes and in average performance at individual nodes, neural network approaches were superior. Analysis showed that neural network methods performed well on links between nodes with no previous common neighbours: potentially the most interesting links. Additionally, while neural network methods benefit from large amounts of data, they require considerable computational resources to utilise it.
Conclusions: Our results indicate that when there is enough data for the neural network methods to use and there is a negligible number of disconnected nodes, those approaches outperform the baselines. At low recall levels the approaches are mostly equal, but at higher recall levels and in average performance at individual nodes, neural network approaches are superior. Superior performance at nodes without common neighbours, which indicate more unexpected and perhaps more useful links, accounts for this. This work was supported by the Medical Research Council [grant number MR/M013049/1] and the Cambridge Commonwealth, European and International Trust.
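    Time-slicing, as used in the evaluations above, trains on the graph as it existed before a cutoff date and tests whether predicted links actually appear afterwards. A minimal sketch of the split and a precision-at-k check (function names, edge format and toy data are illustrative assumptions):

```python
def time_slice(edges, cutoff):
    """Split timestamped edges (u, v, t) into a training graph
    (before the cutoff) and test links (appearing at or after it)."""
    train = [(u, v) for u, v, t in edges if t < cutoff]
    test = [(u, v) for u, v, t in edges if t >= cutoff]
    return train, test

def precision_at_k(ranked_pairs, test_links, k):
    """Fraction of the top-k predicted links that actually appeared."""
    truth = {frozenset(e) for e in test_links}
    hits = sum(1 for e in ranked_pairs[:k] if frozenset(e) in truth)
    return hits / k
```

Unlike random edge removal, this split respects the direction of time, so a high score means the predictor anticipated genuinely new links rather than reconstructing randomly hidden ones.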

    A Comparison and User-based Evaluation of Models of Textual Information Structure in the Context of Cancer Risk Assessment

    BACKGROUND: Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. For example, a simple section-based scheme assigns individual sentences in abstracts to sections such as Objective, Methods, Results and Conclusions. Some schemes of textual information structure have proved useful for biomedical text mining (BIO-TM) tasks (e.g. automatic summarization). However, user-centered evaluation in the context of real-life tasks has been lacking. METHODS: We take three schemes of different types and granularity - those based on section names, Argumentative Zones (AZ) and Core Scientific Concepts (CoreSC) - and evaluate their usefulness for a real-life task which focuses on biomedical abstracts: Cancer Risk Assessment (CRA). We annotate a corpus of CRA abstracts according to each scheme, develop classifiers for automatic identification of the schemes in abstracts, and evaluate both the manual and automatic classifications directly as well as in the context of CRA. RESULTS: Our results show that for each scheme, the majority of categories appear in abstracts, although two of the schemes (AZ and CoreSC) were developed originally for full journal articles. All the schemes can be identified in abstracts relatively reliably using machine learning. Moreover, when cancer risk assessors are presented with scheme-annotated abstracts, they find relevant information significantly faster than when presented with unannotated abstracts, even when the annotations are produced using an automatic classifier. Interestingly, in this user-based evaluation the coarse-grained scheme based on section names proved nearly as useful for CRA as the finest-grained CoreSC scheme.
CONCLUSIONS: We have shown that existing schemes aimed at capturing the information structure of scientific documents can be applied to biomedical abstracts and can be identified in them automatically with an accuracy high enough to benefit a real-life task in biomedicine.

    Recent Advances in Hypertrophic Cardiomyopathy: A System Review

    Hypertrophic cardiomyopathy (HCM) is a common genetic cardiovascular disease, present in 1 in 500 of the general population, and is the most frequent cause of sudden death in young people (including trained athletes) as well as a cause of heart failure and stroke. HCM shows autosomal dominant inheritance and is associated with a large number of mutations in genes encoding proteins of the cardiac sarcomere. Over the last 20 years, the recognition, diagnosis, and treatment of HCM have improved dramatically. Moreover, recent advances in genomic medicine, the growing amount of data from genotype-phenotype correlation studies, and newly identified pathways for HCM are advancing our understanding of its diagnosis, mechanisms, and treatment. In this chapter, we aim to outline the symptoms, complications, and diagnosis of HCM; update pathogenic variants (including miRNAs); review the treatment of HCM; and discuss current efforts to study HCM using induced pluripotent stem cell–derived cardiomyocytes and gene editing technologies. The authors ultimately hope that this chapter will stimulate further research, drive novel discoveries, and contribute to precision medicine in the diagnosis and therapy of HCM.

    Genetic variations in the DYNC2H1 gene causing SRTD3 (short-rib thoracic dysplasia 3 with or without polydactyly)

    Background and aims: Short-rib thoracic dysplasia 3 with or without polydactyly (SRTD3) is a severe fetal skeletal dysplasia (SD) characterized by shortened limbs and a narrow thorax, with or without polydactyly, caused by homozygous or compound heterozygous mutations in the DYNC2H1 gene. SRTD3 is a recessive disorder; identification of the responsible genetic variation is beneficial for accurate prenatal diagnosis and well-grounded counseling for the affected families. Material and methods: Two families who had experienced recurrent fetal SD were recruited and submitted to a multiplatform genetic investigation. Whole-exome sequencing (WES) was performed on samples collected from the probands. Sanger sequencing and fluorescent quantitative PCR (qPCR) were conducted as validation assays for suspected variations. Results: WES identified two compound heterozygous variations in the DYNC2H1 (NM_001080463.2) gene: c.2386C>T (p.Arg796Trp) and c.7289T>C (p.Ile2430Thr) in one family, and exon (64–83)del and c.8190G>T (p.Leu2730Phe) in the other. One of these variants, exon (64–83)del, was novel. Conclusion: The study detected two compound heterozygous variations in DYNC2H1, including one novel deletion: exon (64–83)del. Our findings clarified the cause of fetal skeletal dysplasia in the subject families, provided guidance for their future pregnancies, and highlighted the value of WES in the diagnosis of skeletal dysplasia with unclear prenatal indications.